Abstract

The quantum relative entropy $S(\rho\|\sigma)$ is a widely used dissimilarity measure between
quantum states, but it has the peculiarity of being asymmetric in its arguments. We 
quantify the amount of asymmetry by providing a sharp upper bound in terms of 
two parameters: the trace norm distance between the two states, and the smallest 
of the smallest eigenvalues of both states. The bound is essentially the asymmetry 
between two binary distributions governed by these two parameters. 

1 Introduction

The quantum relative entropy between two quantum states $\rho$ and $\sigma$, $S(\rho\|\sigma) = \mathrm{Tr}\,\rho(\log\rho - \log\sigma)$, is a non-commutative generalisation of the Kullback-Leibler divergence (KLD) $D_{KL}(p\|q)$ between probability distributions $p$ and $q$.

Like the KLD, the relative entropy is not a true metric distance, first and foremost because it is not symmetric in its arguments. In essence, this asymmetry is not a deficiency but a feature, arising from the inherent asymmetry in the mathematical models from which both concepts emerge. For example, $D_{KL}(p\|q)$ can be interpreted as the expected number of extra bits per symbol required to encode a bitstream with a code designed for a source with distribution $q$, when in fact the source is governed by distribution $p$. In the setting of hypothesis testing, where under hypothesis $H_0$ a random variable is distributed according to $p$ and under hypothesis $H_1$ according to $q$, $D_{KL}(p\|q)$ can be interpreted as the expected 'weight of evidence' per sample in favour of $H_0$ and against $H_1$ when $H_0$ is true. To clarify the asymmetry here, we simply quote F. Bavaud, who wrote,
paraphrasing Popper PQ: 
"The theory 'All crows are black' is refuted by the single observation of a
white crow, while the theory 'Some crows are black' is not refuted by the 
observation of a thousand white crows." 

Be this as it may, the quantum relative entropy is widely used as a quantitative measure of the dissimilarity between two quantum states [5], not least because of its simplicity, its clear information-theoretic meaning, and its nice mathematical properties. In these applications, the asymmetry is simply considered part of the price to be paid.

If one does not wish to pay this price, one way out is to replace the relative entropy by a symmetrisation [1]. The symmetrised KL divergence is known as the Jeffreys divergence, or J-divergence:

$$J(p,q) = D_{KL}(p\|q) + D_{KL}(q\|p).$$

Likewise, one can define a quantum J-divergence as

$$J(\rho,\sigma) = S(\rho\|\sigma) + S(\sigma\|\rho).$$

The question addressed in this paper is: how much can the quantum J-divergence differ from twice the quantum relative entropy? Or, asked differently, how great can the asymmetry in the quantum relative entropy be? It is well known that for distributions that are infinitesimally close, the KLD behaves like a true metric: to second order it is a symmetric quadratic form whose Hessian is the Fisher information metric, and the same can be said about the quantum relative entropy. For states that are sufficiently close, we can therefore expect the asymmetry to be small.

To make this statement more precise, we will first look at the simplest example of two binary distributions, $(p, 1-p)$ and $(q, 1-q)$. Let us denote the KLD between these two binary distributions by the function

$$s_2(p\|q) := p\log(p/q) + (1-p)\log\big((1-p)/(1-q)\big).$$

A graph of this function is shown in Fig. 1, along with a graph of its asymmetry $s_2(q\|p) - s_2(p\|q)$. The statement about the smallness of the asymmetry is partially corroborated by the presence of a relatively flat 'plateau' in the middle of the graph. However, one also notices that for very small values of $p$ or $1-p$, the values of $p$ and $q$ have to be much closer together to keep the asymmetry small.
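The following short sketch in plain Python illustrates this remark numerically. It assumes natural logarithms, and the helper names `s2` and `asymmetry` are ours. For the same gap $q - p = 0.05$, it compares the asymmetry in the middle of the unit interval with the asymmetry close to $p = 0$:

```python
import math

def s2(p: float, q: float) -> float:
    """Binary KL divergence s_2(p||q) in nats, for 0 < p, q < 1."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def asymmetry(p: float, q: float) -> float:
    """Asymmetry s_2(q||p) - s_2(p||q) of the binary divergence."""
    return s2(q, p) - s2(p, q)

gap = 0.05                               # fixed difference q - p
plateau = asymmetry(0.45, 0.45 + gap)    # middle of the unit interval
edge = asymmetry(0.01, 0.01 + gap)       # close to the boundary p = 0
print(f"plateau: {plateau:.2e}  edge: {edge:.2e}")
```

On this example the asymmetry near the edge comes out more than a hundred times larger than on the plateau, consistent with the observation that $p$ and $q$ must be much closer together there.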

The asymmetry can be expressed in terms of the difference $t := q - p$ by the function

$$a(p,t) := s_2(p+t\|p) - s_2(p\|p+t) \tag{1}$$
$$= (2p+t)\log\left(1 + \frac{t}{p}\right) + \big(2(1-p) - t\big)\log\left(1 - \frac{t}{1-p}\right), \tag{2}$$

which is defined for $-1 < t < 1$ and $\max(0,-t) < p < \min(1, 1-t)$ (see Fig. 2). A more quantitative statement about the flatness of this function can be made by considering the Taylor series expansion of $a(p,t)$ as a function of $t$,

$$a(p,t) = \left(p^{-2} - (1-p)^{-2}\right)\frac{t^3}{6} - \left(p^{-3} + (1-p)^{-3}\right)\frac{t^4}{6} + O(t^5),$$

and noticing that the leading term is of order 3 in $t$.

Fig. 1. Kullback-Leibler distance between two binary distributions $(p, 1-p)$ and $(q, 1-q)$, and its asymmetry.

Fig. 2. Graph of the asymmetry function $a(p,t)$.
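As a sanity check of (1), (2) and the expansion, the sketch below (plain Python, natural logarithms, helper names of our choosing) compares the definition with the closed form and verifies that, after subtracting the $t^3$ and $t^4$ terms, the remainder divided by $t^5$ stays bounded as $t$ shrinks. This is numerical evidence under these assumptions, not a proof.

```python
import math

def s2(p, q):
    """Binary KL divergence s_2(p||q) (natural logarithm)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def a_def(p, t):
    """a(p, t) from definition (1)."""
    return s2(p + t, p) - s2(p, p + t)

def a_closed(p, t):
    """a(p, t) from the closed form (2); log1p keeps small t accurate."""
    return (2 * p + t) * math.log1p(t / p) + (2 * (1 - p) - t) * math.log1p(-t / (1 - p))

def a_taylor(p, t):
    """The t^3 and t^4 terms of the expansion of a(p, t)."""
    c3 = (p ** -2 - (1 - p) ** -2) / 6
    c4 = -(p ** -3 + (1 - p) ** -3) / 6
    return c3 * t ** 3 + c4 * t ** 4

# (1) and (2) agree at points inside the domain max(0,-t) < p < min(1,1-t).
for p, t in [(0.2, 0.1), (0.5, -0.3), (0.7, 0.25)]:
    print(p, t, a_def(p, t), a_closed(p, t))

# If the expansion is right, the remainder scales like t^5.
p = 0.3
for t in [0.1, 0.05, 0.025]:
    print(t, (a_closed(p, t) - a_taylor(p, t)) / t ** 5)
```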

The main technical contribution of the present paper is to show that the situation just considered for binary distributions is essentially universal and holds for distributions and quantum states of any (finite) dimension, provided the parameter $p$ in the asymmetry function $a(p,t)$ is replaced by the smallest of the smallest eigenvalues of $\rho$ and $\sigma$, which we denote by $z$, and the parameter $t$ is replaced by the trace norm distance $T$ between $\rho$ and $\sigma$. Then the absolute value $|a(z,T)|$ of the asymmetry function is a sharp upper bound on the asymmetry $A(\rho\|\sigma) := |S(\sigma\|\rho) - S(\rho\|\sigma)|$.



2 Main Results 



The bounds we prove here can be conveniently expressed using the function 
a(p,t) defined in the Introduction. 

Theorem 1 Let $\rho$ and $\sigma$ be two density matrices, with trace distance $T = \|\rho - \sigma\|_1/2$ and $\lambda_{\min}(\sigma) = x$. Then

$$S(\rho\|\sigma) - S(\sigma\|\rho) \le a(x,T). \tag{3}$$



The highly technical proof of this theorem is postponed to the last section. 
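Theorem 1 can be probed numerically. The sketch below is an illustration, not part of the proof: it assumes NumPy is available, draws random full-rank states with a Ginibre-type construction (our choice), computes matrix logarithms through an eigendecomposition, and records the smallest observed value of $a(x,T) - [S(\rho\|\sigma) - S(\sigma\|\rho)]$, which should never be negative.

```python
import numpy as np

def logm_pd(a):
    """Matrix logarithm of a positive definite Hermitian matrix (via eigh)."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """S(rho||sigma) = Tr rho (log rho - log sigma), for full-rank states."""
    return float(np.trace(rho @ (logm_pd(rho) - logm_pd(sigma))).real)

def trace_distance(rho, sigma):
    """T = ||rho - sigma||_1 / 2 for Hermitian rho - sigma."""
    return 0.5 * float(np.abs(np.linalg.eigvalsh(rho - sigma)).sum())

def a(p, t):
    """Asymmetry function a(p, t), closed form (2)."""
    return (2 * p + t) * np.log1p(t / p) + (2 * (1 - p) - t) * np.log1p(-t / (1 - p))

def random_state(d, rng):
    """Random full-rank density matrix (normalised Ginibre construction)."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = z @ z.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(0)
min_slack = np.inf
for _ in range(200):
    d = int(rng.integers(2, 5))                  # dimensions 2, 3 or 4
    rho, sigma = random_state(d, rng), random_state(d, rng)
    lhs = rel_entropy(rho, sigma) - rel_entropy(sigma, rho)
    x = float(np.linalg.eigvalsh(sigma)[0])      # lambda_min(sigma)
    rhs = a(x, trace_distance(rho, sigma))
    min_slack = min(min_slack, rhs - lhs)
print("smallest observed slack of (3):", min_slack)
```

Random states are typically far from the extremal (commuting, binary-like) configurations, so this mainly guards against gross errors rather than testing sharpness.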

Corollary 1 Let $\rho$ and $\sigma$ be two density matrices, with $T = \|\rho - \sigma\|_1/2$, $\lambda_{\min}(\sigma) = x$, and $\lambda_{\min}(\rho) = y$. Then

$$|S(\rho\|\sigma) - S(\sigma\|\rho)| \le a(\min(x,y),T). \tag{4}$$



Proof. As $|u| = \max(u,-u)$, an upper bound on $A(\rho,\sigma) := |S(\rho\|\sigma) - S(\sigma\|\rho)|$ is given by the pointwise maximum of the bounds $S(\rho\|\sigma) - S(\sigma\|\rho) \le a(x,T)$ and $S(\sigma\|\rho) - S(\rho\|\sigma) \le a(y,T)$, where the latter is obtained from the former by swapping the roles of $\rho$ and $\sigma$. The statement follows from the facts that $A(\rho,\sigma)$ is symmetric in its arguments and that $a(x,T)$ is strictly decreasing in $x$ for any fixed $T > 0$ (the case $T = 0$ being trivial), so that $\max(a(x,T), a(y,T)) = a(\min(x,y),T)$. □
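The corollary rests on $a(x,T)$ decreasing in $x$. A grid check of that monotonicity, and of the resulting identity $\max(a(x,T), a(y,T)) = a(\min(x,y),T)$, might look as follows (plain Python; the grid size and sample values are arbitrary choices of ours, and a grid check is evidence rather than proof):

```python
import math

def a(p, t):
    """Asymmetry function a(p, t), closed form (2)."""
    return (2 * p + t) * math.log1p(t / p) + (2 * (1 - p) - t) * math.log1p(-t / (1 - p))

def decreasing_on_grid(T, n=200):
    """True if a(x, T) strictly decreases along a grid of x in (0, 1 - T)."""
    xs = [(k + 0.5) * (1 - T) / n for k in range(n)]
    vals = [a(x, T) for x in xs]
    return all(v1 > v2 for v1, v2 in zip(vals, vals[1:]))

for T in [0.05, 0.3, 0.7, 0.95]:
    print(T, decreasing_on_grid(T))

# Pointwise maximum of the two one-sided bounds versus bound (4).
x, y, T = 0.12, 0.05, 0.4
print(max(a(x, T), a(y, T)), a(min(x, y), T))
```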

Note that for $d$-dimensional states $\rho$ and $\sigma$ under the restriction $\rho, \sigma \ge z$, their trace distance is bounded above by $1 - dz$.



3 Proof 



3.1 Preliminaries



The positive part of a self-adjoint operator $X$ is $X_+ := (X + |X|)/2$. It features in an expression for the trace norm distance between states:

$$T(\rho,\sigma) := \frac{1}{2}\|\rho - \sigma\|_1 = \mathrm{Tr}\,(\rho - \sigma)_+. \tag{5}$$
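Identity (5) holds because $\rho - \sigma$ is traceless, so its positive and negative parts have equal trace. A minimal NumPy sketch (the random-state construction and the helper names are our assumptions) compares the two sides:

```python
import numpy as np

def positive_part(x):
    """X_+ = (X + |X|)/2 for Hermitian X, via its spectral decomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.clip(w, 0.0, None)) @ v.conj().T

def half_trace_norm(x):
    """(1/2)||X||_1, i.e. half the sum of the singular values."""
    return 0.5 * float(np.linalg.svd(x, compute_uv=False).sum())

def random_state(d, rng):
    """Random full-rank density matrix (normalised Ginibre construction)."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = z @ z.conj().T
    return m / np.trace(m).real

rng = np.random.default_rng(1)
rho, sigma = random_state(4, rng), random_state(4, rng)
lhs = half_trace_norm(rho - sigma)
rhs = float(np.trace(positive_part(rho - sigma)).real)
print(lhs, rhs)   # equal up to rounding, since Tr(rho - sigma) = 0
```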



The following integral representation of the logarithm is well known:

$$\log x = \int_0^\infty ds \left(\frac{1}{1+s} - \frac{1}{x+s}\right), \qquad x > 0. \tag{6}$$



By functional calculus, this representation extends to positive operators $A > 0$ as

$$\log A = \int_0^\infty ds \left((1+s)^{-1}I - (A + sI)^{-1}\right). \tag{7}$$
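Representation (7) can be checked numerically against an eigendecomposition-based logarithm. In this sketch (NumPy assumed; the substitution $s = u/(1-u)$ and Gauss-Legendre quadrature are our choices, not something the text prescribes) the infinite range is mapped onto $[0,1)$, where the transformed integrand is smooth:

```python
import numpy as np

def logm_eigh(a):
    """Reference: matrix logarithm of a positive definite symmetric matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T

def logm_integral(a, n=200):
    """Evaluate (7), log A = int_0^inf ds [(1+s)^{-1} I - (A+sI)^{-1}], numerically.

    The half-line is mapped onto [0, 1) by s = u/(1-u), ds = du/(1-u)^2;
    the transformed integrand is smooth on [0, 1], so Gauss-Legendre works well.
    """
    eye = np.eye(a.shape[0])
    nodes, weights = np.polynomial.legendre.leggauss(n)
    total = np.zeros_like(a, dtype=float)
    for z, w in zip(nodes, weights):
        u = 0.5 * (z + 1.0)                      # map [-1, 1] onto [0, 1]
        s = u / (1.0 - u)
        integrand = eye / (1.0 + s) - np.linalg.inv(a + s * eye)
        total += 0.5 * w * integrand / (1.0 - u) ** 2
    return total

rng = np.random.default_rng(2)
g = rng.normal(size=(3, 3))
A = g @ g.T + 0.2 * np.eye(3)                    # real symmetric, A > 0
print(np.max(np.abs(logm_integral(A) - logm_eigh(A))))
```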



From this follows an integral representation for the derivative of the matrix logarithm: for $A > 0$,

$$\mathcal{T}_A(\Delta) := \frac{d}{dt}\Big|_{t=0}\log(A + t\Delta) = \int_0^\infty ds\,(A + sI)^{-1}\,\Delta\,(A + sI)^{-1}. \tag{8}$$



Just as the first derivative of the logarithm defines the linear operator $\mathcal{T}$, we can also define a quadratic operator $\mathcal{R}$ via the second derivative [3]. For $A > 0$ and $\Delta$ self-adjoint,

$$\mathcal{R}_A(\Delta) := -\frac{d^2}{dt^2}\Big|_{t=0}\log(A + t\Delta) = 2\int_0^\infty ds\,(A + sI)^{-1}\,\Delta\,(A + sI)^{-1}\,\Delta\,(A + sI)^{-1}. \tag{9}$$
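The operators $\mathcal{T}_A$ and $\mathcal{R}_A$ defined in (8) and (9) can be compared with central finite differences of $t \mapsto \log(A + t\Delta)$. The sketch below does this in NumPy; the quadrature, the step sizes and the test matrices are our assumptions. In the commuting case (9) should reduce to $\Delta^2A^{-2}$, which the last line checks:

```python
import numpy as np

def logm_eigh(a):
    """Matrix logarithm of a positive definite symmetric matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T

def log_derivative(a, delta, order, n=200):
    """Quadrature for T_A(Delta) (order=1, eq. (8)) or R_A(Delta) (order=2, eq. (9))."""
    eye = np.eye(a.shape[0])
    nodes, weights = np.polynomial.legendre.leggauss(n)
    total = np.zeros_like(a, dtype=float)
    for z, w in zip(nodes, weights):
        u = 0.5 * (z + 1.0)
        s = u / (1.0 - u)                        # s in [0, inf), ds = du / (1-u)^2
        r = np.linalg.inv(a + s * eye)
        term = r @ delta @ r if order == 1 else 2.0 * (r @ delta @ r @ delta @ r)
        total += 0.5 * w * term / (1.0 - u) ** 2
    return total

rng = np.random.default_rng(3)
g = rng.normal(size=(3, 3))
A = g @ g.T / 3 + np.eye(3)                      # symmetric, eigenvalues >= 1
h0 = rng.normal(size=(3, 3))
D = 0.1 * (h0 + h0.T)                            # symmetric direction Delta

# Central finite differences of t -> log(A + t D) at t = 0.
h1, h2 = 1e-5, 5e-4
T_fd = (logm_eigh(A + h1 * D) - logm_eigh(A - h1 * D)) / (2 * h1)
R_fd = -(logm_eigh(A + h2 * D) - 2 * logm_eigh(A) + logm_eigh(A - h2 * D)) / h2 ** 2

print(np.max(np.abs(log_derivative(A, D, 1) - T_fd)))
print(np.max(np.abs(log_derivative(A, D, 2) - R_fd)))

Ad, Dd = np.diag([0.5, 1.0, 3.0]), np.diag([0.3, -0.2, 0.1])
print(np.allclose(log_derivative(Ad, Dd, 2), Dd @ Dd @ np.linalg.inv(Ad @ Ad), atol=1e-10))
```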



3.2 A technical Proposition 



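Proposition 1 Let $\sigma$ be a finite-dimensional density matrix with $x = \lambda_{\min}(\sigma)$. Let $\Delta = \Delta_+ - \Delta_-$ with $\mathrm{Tr}\,\Delta_\pm = 1$. Let $t$ be a non-negative number such that $\sigma + t\Delta$ is also a density matrix. Then

$$\mathrm{Tr}\,\Delta\,\mathcal{R}_{\sigma+t\Delta}(\Delta) \le (x+t)^{-2} - (1-x-t)^{-2}.$$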



Proof. Denote $\rho = \sigma + t\Delta$.

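The first step of the proof is a Fiedler-type argument¹ that $\mathrm{Tr}\,\Delta\,\mathcal{R}_\rho(\Delta)$ achieves its maximal value when $\Delta$ and $\rho$ commute. By the integral representation (9), we have

$$\mathrm{Tr}\,\Delta\,\mathcal{R}_\rho(\Delta) = 2\int_0^\infty ds\,\mathrm{Tr}\left(\Delta(\rho+s)^{-1}\Delta(\rho+s)^{-1}\Delta(\rho+s)^{-1}\right) = 2\int_0^\infty ds\,\mathrm{Tr}\left(\Delta(\rho+s)^{-1}\right)^3.$$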



Now let $U$ be unitary and $\Delta = U\Delta_0U^*$, and write $M = (\rho+s)^{-1}$. We first show that the maximum of $\mathrm{Tr}(\Delta(\rho+s)^{-1})^3$ over all unitary $U$ is obtained when $\Delta$ and $\rho$ commute. Any unitary matrix can be written as $e^{K}$ with $K$ skew-Hermitian; we consider the curve $U = e^{tK}$. With this parameterisation,

$$\frac{d}{dt}\Big|_{t=0}\mathrm{Tr}(\Delta M)^3 = \frac{d}{dt}\Big|_{t=0}\mathrm{Tr}\left(e^{tK}\Delta_0e^{-tK}M\right)^3 = 3\,\mathrm{Tr}\left([K,\Delta_0]M\Delta_0M\Delta_0M\right) = 3\,\mathrm{Tr}\left(K\left((\Delta_0M)^3 - (M\Delta_0)^3\right)\right).$$

Any extremal point of $\mathrm{Tr}(\Delta M)^3$ is therefore characterised by the requirement that $\mathrm{Tr}\left(K((\Delta_0M)^3 - (M\Delta_0)^3)\right) = 0$ for all skew-Hermitian $K$. Since $(\Delta_0M)^3 - (M\Delta_0)^3$ is itself skew-Hermitian, this amounts to the equation $(\Delta_0M)^3 = (M\Delta_0)^3$. If we now make the assumption that $\Delta_0$ and $M$ are such that $\Delta_0M$ has simple eigenvalues (which is true for a dense subset), this means that $\Delta_0M = M\Delta_0$, too, i.e. $\Delta_0$ and $M$ must commute.
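The derivative formula above can be checked numerically by rotating $\Delta_0$ along the curve $e^{tK}\Delta_0e^{-tK}$ and differencing. The NumPy sketch below uses random test matrices of our choosing, with $M$ a positive definite matrix standing in for $(\rho+s)^{-1}$; it also shows that the gradient vanishes for a commuting (here diagonal) pair:

```python
import numpy as np

def expm_skew(k):
    """exp(K) for skew-Hermitian K, via the Hermitian matrix H = -iK (K = iH)."""
    w, v = np.linalg.eigh(-1j * k)
    return (v * np.exp(1j * w)) @ v.conj().T

def trace_cube(delta, m):
    """Tr (Delta M)^3, which is real when Delta and M are Hermitian."""
    x = delta @ m
    return np.trace(x @ x @ x).real

def gradient_formula(k, delta0, m):
    """3 Tr(K((Delta0 M)^3 - (M Delta0)^3)), the derivative at t = 0."""
    dm, md = delta0 @ m, m @ delta0
    return 3 * np.trace(k @ (dm @ dm @ dm - md @ md @ md)).real

def random_hermitian(d, rng):
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (z + z.conj().T) / 2

rng = np.random.default_rng(5)
d = 3
delta0 = random_hermitian(d, rng)
h_mat = random_hermitian(d, rng)
m = np.linalg.inv(h_mat @ h_mat + np.eye(d))     # positive definite, like (rho + s)^{-1}
k = 1j * random_hermitian(d, rng)                # skew-Hermitian direction K

def rotated(t):
    u = expm_skew(t * k)                         # U = e^{tK}
    return u @ delta0 @ u.conj().T               # e^{tK} Delta0 e^{-tK}

h = 1e-5
fd = (trace_cube(rotated(h), m) - trace_cube(rotated(-h), m)) / (2 * h)
print(fd, gradient_formula(k, delta0, m))

# For a commuting (both diagonal) pair the gradient vanishes for every K.
comm = gradient_formula(k, np.diag([0.5, -0.2, -0.3]), np.diag([2.0, 1.0, 0.5]))
print(comm)
```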

For such an extremal point to be a maximum, an additional condition must hold. In a basis in which the eigenvalues of $M$ appear in decreasing order (in which both $M$ and $\Delta_0$ are diagonal) the diagonal elements of $\Delta_0$ must appear in decreasing order too; recall that the eigenvalues of $M = (\rho+s)^{-1}$ are strictly positive for finite $s$.

¹ Named after Fiedler's technique used to prove a well-known result in matrix analysis, see e.g. Th. VI.7.1 in [2].

Now this is so independently of the value of $s$. Therefore, the maximising $\Delta$ for the entire integral $\int_0^\infty ds\,\mathrm{Tr}(\Delta(\rho+s)^{-1})^3$ must commute with $\rho$, and in a basis in which $\rho$ is diagonal and has its diagonal elements appearing in increasing order, the diagonal elements of $\Delta$ must appear in decreasing order.

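Now that the problem has been reduced to the commuting case, we can simplify $\mathrm{Tr}\,\Delta\,\mathcal{R}_\rho(\Delta)$: since $2\int_0^\infty ds\,(\lambda+s)^{-3} = \lambda^{-2}$ for $\lambda > 0$, it becomes $\mathrm{Tr}\,\Delta^3\rho^{-2} = \mathrm{Tr}\,\Delta_+^3(\sigma + t\Delta_+)^{-2} - \mathrm{Tr}\,\Delta_-^3(\sigma - t\Delta_-)^{-2}$. Given the conditions on $\Delta$, the ranks of its positive and negative parts must be between 1 and $d-1$.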

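Since $\lambda_{\min}(\sigma) = x$, we have $\sigma \ge xI \ge x\Delta_+$ (the latter because $0 \le \Delta_+ \le I$, as $\mathrm{Tr}\,\Delta_+ = 1$). Therefore, on the support of $\Delta_+$,

$$\Delta_+^3(\sigma + t\Delta_+)^{-2} \le \Delta_+^3(x\Delta_+ + t\Delta_+)^{-2} = (x+t)^{-2}\Delta_+,$$

which immediately implies

$$\mathrm{Tr}\,\Delta_+^3(\sigma + t\Delta_+)^{-2} \le (x+t)^{-2}. \tag{10}$$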

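For any $A > 0$ commuting with $\Delta_-$, Hölder's inequality gives

$$1 = \mathrm{Tr}\,\Delta_- = \mathrm{Tr}\left(\Delta_-A^{-2/3}A^{2/3}\right) \le \|\Delta_-A^{-2/3}\|_3\,\|A^{2/3}\|_{3/2} = \left(\mathrm{Tr}\,\Delta_-^3A^{-2}\right)^{1/3}\left(\mathrm{Tr}\,A\right)^{2/3},$$

so that

$$\mathrm{Tr}\,\Delta_-^3A^{-2} \ge (\mathrm{Tr}\,A)^{-2}.$$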

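Recall that the rank of $\Delta_-$ is at most $d-1$. Apply this with $A$ equal to the restriction of $\rho = \sigma + t\Delta$ to the support of $\Delta_-$. Writing $P$ for the projector onto that support and $\lambda_j^{\downarrow}(\sigma)$ for the eigenvalues of $\sigma$ in decreasing order, we have

$$\mathrm{Tr}\,A = \mathrm{Tr}\,P(\sigma - t\Delta_-) = \mathrm{Tr}\,P\sigma - t \le \sum_{j=1}^{d-1}\lambda_j^{\downarrow}(\sigma) - t = 1 - x - t,$$

and we get

$$\mathrm{Tr}\,\Delta_-^3(\sigma - t\Delta_-)^{-2} \ge (1-x-t)^{-2}. \tag{11}$$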



Combining the two bounds (10) and (11), we get the bound of the proposition. □
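In the commuting case the two bounds (10) and (11) apply directly, so Proposition 1 can be spot-checked with diagonal matrices. The plain-Python sketch below (the random construction and the helper names are ours) draws diagonal $\sigma$, $\Delta_+$ and $\Delta_-$ with disjoint supports, picks an admissible $t$, and counts violations; the non-commuting step of the proof is not exercised here:

```python
import random

def prop1_commuting(sig, d_plus, d_minus, t):
    """Both sides of Proposition 1 when sigma and Delta are diagonal.

    In this case Tr Delta R_rho(Delta) reduces to sum_j delta_j^3 / rho_j^2.
    """
    x = min(sig)
    delta = [p - m for p, m in zip(d_plus, d_minus)]
    rho = [s + t * dj for s, dj in zip(sig, delta)]
    lhs = sum(dj ** 3 / rj ** 2 for dj, rj in zip(delta, rho))
    rhs = (x + t) ** -2 - (1 - x - t) ** -2
    return lhs, rhs

def random_instance(rng, d):
    """Random diagonal sigma, Delta_+, Delta_- (disjoint supports) and t >= 0."""
    raw = [rng.random() + 1e-3 for _ in range(d)]
    sig = [r / sum(raw) for r in raw]
    idx = list(range(d))
    rng.shuffle(idx)
    k = rng.randint(1, d - 1)                    # rank of Delta_+ is between 1 and d-1
    pos, neg = idx[:k], idx[k:]
    d_plus, d_minus = [0.0] * d, [0.0] * d
    wp = [rng.random() + 1e-3 for _ in pos]
    wn = [rng.random() + 1e-3 for _ in neg]
    for i, w in zip(pos, wp):
        d_plus[i] = w / sum(wp)
    for i, w in zip(neg, wn):
        d_minus[i] = w / sum(wn)
    # rho = sigma + t Delta stays positive iff t < sigma_j / (Delta_-)_j on neg.
    t_max = min(sig[i] / d_minus[i] for i in neg)
    return sig, d_plus, d_minus, rng.uniform(0.0, 0.99 * t_max)

rng = random.Random(0)
violations = 0
for _ in range(2000):
    lhs, rhs = prop1_commuting(*random_instance(rng, rng.randint(2, 5)))
    if lhs > rhs + 1e-9 * max(1.0, abs(rhs)):
        violations += 1
print("violations of Proposition 1 (commuting case):", violations)
```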



3.3 Proof of Theorem 1

The proof of Theorem 1 follows from the inequality of Proposition 1 using three successive integrations. Let $\Delta = (\rho - \sigma)/T$, so that $\Delta = \Delta_+ - \Delta_-$ and $\Delta_\pm$ are density matrices.

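Let us first perform the integration $\int_u^v dt$, for $0 \le u \le v \le T$, on each side of the inequality

$$\mathrm{Tr}\,\Delta\,\mathcal{R}_{\sigma+t\Delta}(\Delta) \le (x+t)^{-2} - (1-x-t)^{-2}.$$

Since $\frac{d}{dt}\mathcal{T}_{\sigma+t\Delta}(\Delta) = -\mathcal{R}_{\sigma+t\Delta}(\Delta)$, this gives

$$\mathrm{Tr}\,\Delta\left(\mathcal{T}_{\sigma+u\Delta}(\Delta) - \mathcal{T}_{\sigma+v\Delta}(\Delta)\right) \le \int_u^v dt\left((x+t)^{-2} - (1-x-t)^{-2}\right) = (x+u)^{-1} - (x+v)^{-1} + (1-x-u)^{-1} - (1-x-v)^{-1}.$$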
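Next, we perform the integration $\int_0^v du$:

$$\mathrm{Tr}\,\Delta\left(\log(\sigma + v\Delta) - \log\sigma - \mathcal{T}_{\sigma+v\Delta}(v\Delta)\right) \le \int_0^v du\left((x+u)^{-1} - (x+v)^{-1} + (1-x-u)^{-1} - (1-x-v)^{-1}\right)$$
$$= \log\frac{x+v}{x} - \frac{v}{x+v} + \log\frac{1-x}{1-x-v} - \frac{v}{1-x-v}.$$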

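Now the left-hand side can be rewritten as

$$\frac{d}{dv}\,\mathrm{Tr}\,(2\sigma + v\Delta)\left(\log(\sigma + v\Delta) - \log\sigma\right),$$

because $\mathrm{Tr}\,X\,\mathcal{T}_A(\Delta) = \mathrm{Tr}\,\mathcal{T}_A(X)\,\Delta$, $\mathcal{T}_A(A) = I$ and $\mathrm{Tr}\,\Delta = 0$, so that $\mathrm{Tr}\,(2\sigma + v\Delta)\,\mathcal{T}_{\sigma+v\Delta}(\Delta) = -v\,\mathrm{Tr}\,\Delta\,\mathcal{T}_{\sigma+v\Delta}(\Delta)$.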
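This rewriting can be confirmed with finite differences. In the NumPy sketch below, $g(v) = \mathrm{Tr}\,\Delta\log(\sigma+v\Delta)$, so that $g'(v) = \mathrm{Tr}\,\Delta\,\mathcal{T}_{\sigma+v\Delta}(\Delta)$ and the left-hand side equals $g(v) - g(0) - v\,g'(v)$. The states are random mixtures with the maximally mixed state (our choice, to keep eigenvalues away from zero), and the step size is an assumption:

```python
import numpy as np

def logm_pd(a):
    """Matrix logarithm of a positive definite Hermitian matrix (via eigh)."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def random_state(d, rng):
    """Random density matrix mixed half-and-half with I/d (eigenvalues >= 1/(2d))."""
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    m = z @ z.conj().T
    return 0.5 * m / np.trace(m).real + 0.5 * np.eye(d) / d

rng = np.random.default_rng(4)
rho, sigma = random_state(3, rng), random_state(3, rng)
T = 0.5 * np.abs(np.linalg.eigvalsh(rho - sigma)).sum()
delta = (rho - sigma) / T            # Tr Delta = 0, and sigma + T Delta = rho
log_sigma = logm_pd(sigma)

def F(v):
    """Tr (2 sigma + v Delta)(log(sigma + v Delta) - log sigma)."""
    return np.trace((2 * sigma + v * delta) @ (logm_pd(sigma + v * delta) - log_sigma)).real

def g(v):
    """Tr Delta log(sigma + v Delta)."""
    return np.trace(delta @ logm_pd(sigma + v * delta)).real

v, h = 0.6 * T, 1e-5
dF = (F(v + h) - F(v - h)) / (2 * h)     # d/dv of the expression above
dg = (g(v + h) - g(v - h)) / (2 * h)     # Tr Delta T_{sigma + v Delta}(Delta)
lhs = g(v) - g(0.0) - v * dg             # left-hand side after the second integration
print(dF, lhs)
```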

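Performing the third and final integration $\int_0^T dv$ then yields

$$\mathrm{Tr}\,(2\sigma + T\Delta)\left(\log(\sigma + T\Delta) - \log\sigma\right) \le \int_0^T dv\left(\log\frac{x+v}{x} - \frac{v}{x+v} + \log\frac{1-x}{1-x-v} - \frac{v}{1-x-v}\right)$$
$$= (2x+T)\log(1 + T/x) + \big(2(1-x) - T\big)\log\big(1 - T/(1-x)\big).$$

Noting that $\sigma + T\Delta = \rho$, so that the left-hand side is just $S(\rho\|\sigma) - S(\sigma\|\rho)$, and that the right-hand side is $a(x,T)$, completes the proof. □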
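Each of the three scalar integrations can be checked independently with a quadrature rule. The plain-Python sketch below uses a composite Simpson rule and the sample values $x = 0.15$, $T = 0.5$ (both arbitrary choices of ours, subject to $T < 1 - x$):

```python
import math

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule on [lo, hi] with an even number n of panels."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3

x, T = 0.15, 0.5

def r1(u, v):
    """Right-hand side after the first integration, over t in [u, v]."""
    return 1 / (x + u) - 1 / (x + v) + 1 / (1 - x - u) - 1 / (1 - x - v)

def r2(v):
    """Right-hand side after the second integration, over u in [0, v]."""
    return math.log((x + v) / x) - v / (x + v) + math.log((1 - x) / (1 - x - v)) - v / (1 - x - v)

def a(p, t):
    """Closed form (2) of the asymmetry function."""
    return (2 * p + t) * math.log1p(t / p) + (2 * (1 - p) - t) * math.log1p(-t / (1 - p))

u, v = 0.1, 0.4
step1 = simpson(lambda s: (x + s) ** -2 - (1 - x - s) ** -2, u, v) - r1(u, v)
step2 = simpson(lambda s: r1(s, v), 0.0, v) - r2(v)
step3 = simpson(r2, 0.0, T) - a(x, T)
print(step1, step2, step3)   # all three should be close to zero
```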